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SUMMARY 

We  suppose  that  our  observations  can  be  decomposed  into  a  fixed  signal  plus  random 
noise,  where  the  noise  is  modelled  as  a  particular  stationary  Gaussian  random  field  in  N - 
dimensional  Euclidean  space.  The  signal  has  the  form  of  a  known  function  centered  at  an 
unknown  location  and  multiplied  by  an  unknown  amplitude,  and  we  are  primarily  interested 
in  a  test  to  detect  such  a  signal.  There  are  many  examples  where  the  signal  scale  or  width 
is  assumed  known,  and  the  test  is  based  on  maximising  a  Gaussian  random  field  over  all 
locations  in  a  subset  of  .V-dimensional  Euclidean  space.  The  novel  feature  of  this  work  is 
that  the  width  of  the  signal  is  also  unknown  and  the  test  is  based  on  maximising  a  Gaussian 
random  field  in  N  + 1 -dimensions,  N  dimensions  for  the  location  plus  one  dimension  for  the 
width.  Two  convergent  approaches  are  used  to  approximate  the  null  distribution:  one  based 
on  the  method  of  Knowles  and  Siegmund  (1989),  which  uses  a  version  of  Weyl’s  (1939)  tube 
formula  for  manifolds  with  boundaries,  and  the  other  based  on  some  recent  work  by  Worsley 
(1993b),  which  uses  the  Hadwiger  characteristic  of  excursion  sets  as  introduced  by  Adler 
(1981).  Finally  we  compare  the  power  of  our  method  with  one  based  on  a  fixed  but  perhaps 
incorrect  signal  width. 
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1  Introduction 

Given  a  fixed  value  to  €  C,  non-negative  and  positive  <ro,  assume  that  the  random  field 
{Z(t),t  €  IRn}  satisfies 

dZ( t)  =  ^oN/2f[aol(t  -  t0)]dt  +  d\V( t).  (1.1) 

Here  C  is  a  subset  of  N  dimensional  Euclidean  space,  e.g.,  a  rectangle,  /  is  a  square  integrable 
function,  which  without  loss  of  generality  can  be  assumed  to  satisfy 

J  f(t)2dt  =  1,  (1.2) 

and  W  is  Gaussian  white  noise.  The  unknown  parameter  (<f,  t0,  <tq)  represents  the  amplitude, 
location  and  scale  of  the  signal,  which  is  usually  positive,  symmetric  and  unimodal.  We  shall 
be  primarily  interested  in  testing  the  hypothesis  of  no  signal,  i.e.,  that  £  =  0,  and  to  a  lesser 
extent  in  estimating  the  value  of  to  when  £  >  0.  A  case  of  special  interest  is  the  Gaussian 
case,  where 

/(t)  =  r-A"‘exp(-||t||!/2).  (1.3) 

Although  it  is  unrealistic  to  suppose  that  Z  is  defined  throughout  IR'V,  this  assumption 
allows  us  to  avoid  problems  of  edge  effects  and  may  be  a  reasonable  approximation  when 
(X0  is  small  compared  to  the  distance  between  to  and  the  boundary  of  the  region  where  Z  is 
defined. 

Models  of  at  least  approximately  this  form  have  been  used  in  a  number  of  scientific 
contexts.  One  is  the  model  of  Worsley  et  af.  (1992)  of  blood  flow  changes  in  the  human 
brain  observed  via  positron  emission  tomography;  here  N  —  3  and  C  is  the  brain.  Another 
is  the  model  of  Rabinowitz  (1993)  for  the  geographical  clustering  of  disease  incidence,  where 
N  =  2.  Focus  for  the  current  research  came  originally  from  questions  raised  by  John  H. 
Cobb  of  the  Physics  Institute  of  the  University  of  Oxford,  who  is  involved  in  searching  a 
portion  of  the  sky  for  evidence  of  a  point  source  of  high  activity  against  a  background  of 
Poisson,  hence  approximately  Gaussian,  white  noise;  here  again  N  =  2. 

Given  an  N  dimensional  kernel  k,  we  define  the  Gaussian  process 

A-(t,<7)  =  <t~n'2  j  *[<r*(h  -  t)]dZ(h).  (1.4) 

Assuming  that  the  value  of  <ro  is  known  to  lie  in  the  interval  [cri,o-2],  we  consider  the  test  of 
£  =  0  that  rejects  for  large  (positive)  values  of 

Xm*x  =  max  A(t,cr),  (1.5) 

tftr 

where  the  max  extends  over  t  €  C  and  <T\  <  a  <  <r2.  If  the  shape  of  the  signal,  i.e.,  /,  is 
known,  we  can  take  k  =  /.  It  can  be  shown  (see  below)  that  the  log  likelihood  function  is 

{X(t0,a0)-£72,  (!■«) 
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so  the  test  defined  by  (1.5)  is  the  likelihood  ratio  test.  This  is  more  or  less  obvious  in  the 
case  that  one  actually  observes  the  process  Z  defined  by  (1.1).  However,  in  many  cases 
one  can  only  observe  the  smoothed  process  (1.4).  Then  this  statement  is  a  consequence  of 
the  following  general  result.  Let  (T(r),  r  €  5}  be  a  Gaussian  random  field  with  covariance 
function  /2(rj,  r2)  and  mean  value  function  of  the  form  E[K(r)j  =  £R(r, r0),  parameterized 
by  ro  €  5  and  £  6  1R.  Let  Q(,r0  denote  the  distribution  of  Y.  Then  the  log  likelihood  ratio  is 

logdQtjJdQo  =  £Y( T0)  -  Z2R( r0,r0)/2. 

If  5  is  finite,  this  is  an  easy  direct  calculation  involving  the  multivariate  normal  density  or  an 
exercise  in  linear  models  (see  Worsley,  1993a).  For  5  a  complete,  separable  metric  space  the 
theory  of  reproducing  kernel  Hilbert  spaces  delivers  the  goods  (cf.  Parzen,  1961,  Theorem 
7A). 

Sections  3  and  4  of  this  paper  are  concerned  with  approximate  evaluation  of  the  signifi¬ 
cance  level  of  the  test  defined  by  (1.5),  i.e.,  the  probability  when  £  =  0  that  Am**  exceeds  a 
constant  threshold,  say  6.  First  order  approximations  for  this  can  easily  be  derived  from  the 
results  going  back  to  Belyaev  and  Pitaberg  (1972)  (see  Adler,  1981,  Theorem  6.9.1,  p.  160) 
who  give  the  the  following.  Suppose  Y(t)  is  a  zero  mean,  unit  variance,  stationary  random 
field  defined  on  an  interval  S  C  IRn,  and  let  =  maxres  K(r)  and 

F(b)  =  \S\det[V*x(dY/dr)]1'2bn-lm/Wn'2,  (1.7) 

where  |S|  is  the  Lebesgue  measure  of  S  and  ^>(6)  =  (2;r)-1/2exp(— 62/2).  Then 

limP{rin«>6}/F(5)  =  l. 

0  *00 

In  our  case  n  =  N  +  1,  K(r)  =  X(t,a),  S  is  the  cartesian  product  of  the  the  region  C  and 
the  interval  [<ri,<72].  It  only  remains  to  find  the  variance  matrix  of  dY/d r,  which  in  our  case 
depends  on  a,  so  the  root  determinant  in  (1.7)  should  be  replaced  by  its  average  over  5. 

In  practice,  however,  F{b)  can  be  a  poor  approximation  to  the  exceedence  probability  of 
Xn»».  In  typical  applications  to  medical  images  the  range  of  a  is  small,  and  since  F(b)  is 
proportional  to  the  volume  of  5,  then  F(b)  approaches  zero  as  <r2  —  ox  approaches  zero.  This 
is  obviously  unsatisfactory;  a  better  approximation  can  be  obtained  by  fixing  a  at  say  <7j  and 
repeating  the  above  argument  for  n  =  N.  The  same  phenomenon  occurs  if  we  shrink  the 
volume  of  5  to  zero;  the  approximation  F(b)  approaches  zero  and  yet  a  better  approximation 
can  be  obtained  by  fixing  t  and  applying  n  =  1  dimensional  theory  (see  Leadbetter,  Lindgren 
and  Rootzen,  1983).  Clearly  what  is  required  is  a  higher  order  approximation,  and  we  offer 
two  approaches. 

Our  first  approach  in  section  3  to  the  distribution  of  is  based  on  the  Euler,  or 
more  specifically  the  Hadwiger,  characteristic  of  excursion  sets.  The  second  approach  in 
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section  4  is  concerned  with  an  extension  of  the  method  of  Knowles  and  Siegmund  (1989), 
which  involves  a  version  of  Weyl’s  (1939)  tube  formula  for  manifolds  with  boundaries.  These 
methods  involve  a  substantia!  amount  of  geometric  machinery  to  calculate  a  quantity  related, 
but  not  equal,  to  the  probability  of  interest.  Thus  it  is  reassuring  that  the  two  methods  give 
essentially  the  same  result  and  illuminate  complementary  aspects  of  the  calculation.  An 
interesting  aspect  of  the  method  of  tubes  is  that  for  the  relevant  Gaussian  fields  an  important 
associated  manifold  has  constant  negative  curvature. 

Section  5  is  concerned  with  the  power  of  the  likelihood  ratio  test  We  also  compare  the 
power  of  the  likelihood  ratio  test,  which  involves  a  data  dependent  estimate  of  <70,  as  indicated 
in  (1.5),  with  the  analogous  test  based  on  an  arbitrary  fixed  choice  of  cr,  which  in  general 
differs  from  <To-  Section  6  contains  a  simple  example  and  an  application  to  positron  emission 
tomography  images,  and  Section  7  contains  some  additional  discussion. 

2  Preliminaries 

Under  the  null  hypothesis  of  no  signal  we  can  write 

X(t,«r)  =  a~N'2  j  *(<r-1(h  -  t)]dW(h) 

where  fc(h),  h  €  R/V,  is  a  smooth  kernel  with  f  &(h)2dh  =  1;  sufficient  smoothness  is 
discussed  below.  Then  X(t,a)  is  a  zero  mean,  unit  variance  Gaussian  random  field  with 
correlation  function 

Cor[A'(t1,<71),  X(t2,a2)]  =  J  i[<7,-1(h  -  tj )]Ar[<r^1(h  -  t2)]dh. 

Note  that  for  fixed  a ,  X(t,cr)  is  stationary  in  t  and  for  fixed  t,  X(t,cr)  is  stationary  in  log  <7. 
Introduce  s  =  —  log<r  and  with  a  slight  abuse  of  notation  let  X(t,  s)  =  X(t,  a).  Then 

Cor[X(t1,s1),X(t2,a2)]  =  e*<“+*»>'a  J  fc[(h  -  t1)e*1]Jk((h  -  t2)e*’]dh  (2.1) 

=  J  k[h  +  (t2  -  t1)el']k[he’’-t')cfa  (2.2) 

but  note  that  X(t,s)  is  not  stationary  in  (t,s). 

We  shall  need  the  joint  distribution  of  the  derivatives  of  the  field  up  to  second  order. 
Suppose  jfc(h)  =  *(— h).  Let  X  =  X(t,s),  X,  =  dX/ds,  X  =  dX/di  and  D  =  d2X/dtdt\ 
where  prime  denotes  transpose.  At  a  fixed  point  ( X ,  X„X,  D)  is  multivariate  Gaussian  with 
zero  mean  and  variance  matrix 


X  \ 

f  1 

0 

0 

— e2*A  > 

X. 

0 

K 

0 

-e2'A 

X 

0 

0 

e2*A 

0 

D  j 

^  -e2*A 

— e2*A 

0 

^  / 
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where 


A  =  j  k(h)k'(h)dh.  «  =  J [h'k(h)  +  (N/2)k]2dh. 

k(h)  denotes  <9/c(h)/<9h  and  e  is  such  that  the  covariance  between  the  (i.j)  element  of  D 
and  the  ( k,l )  element  of  D  is  symmetric  in  i,j,  k.  /,  1  <  i,j,kj  <  N  (see  Adler,  1981,  p. 
114).  For  the  special  case  of  a  Gaussian  kernel  k( h)  =  /(h)  from  (1.3)  we  have  A  =  1/2, 
where  I  is  the  identity  matrix,  and  n  =  N/2. 

Deriving  these  covariances  takes  some  trial  and  error;  we  illustrate  the  method  for 
Cov(D,  X»).  From  (2.1)  we  have 

J-Cor[X(t1,a,i.X(ta,32))  =  J^(h-tl)eti]k[(h-t2)e^)dh 

=  _c-v<*«+*»>/2+'«  j  k[he,1)Ar[(h  4-  ti  —  t2)e*a]dh, 

^rCor[.Y(t„s1...V(t2,s2)]  =  — e('v/2+1>(ll+,j) |k[he4>]k'[(h  +  t,  -  ta)e*»]dh<2.3) 

Differentiating  (2.3)  with  respect  to  s2,  setting  ti  =  t2  and  sj  =  s2  =  s.  and  changing  the 
variable  of  integration  we  get 

Cov(D.  X.)  =  -t2i  j  k(h)(h'k(h)  +  (N/2  +  l)k'(h)]dh.  (2.4) 

Changing  the  variable  of  integration  in  (2.3)  first  gives 

5^Cor[A'(tI,S,),X(t!,3j)l  =  J  k(he«— lk'{h  +  (t,  -  t2)e»]rfh, 

then  differentiating  with  respect  to  s2,  setting  ti  =  t2  and  si  =  s2  =  s,  and  changing  the 
variable  of  integration  we  get 

Cov(D,  X,)  =  e2*  J [k(h)h  +  (N/2  -  l)k(h)]k'(h)dh.  (2.5) 

Taking  the  average  of  both  sides  of  (2.4)  and  (2.5),  and  noting  that  D  is  symmetric,  gives 

Cov(D,*,)  =  -e2‘J  k(h)k'(h)dh  =  -e2*A. 

The  conditions  we  shall  require  on  the  smoothness  of  the  kernel  k( h)  will  ensure  that 
realisations  of  K(r)  =  -Y(t,s)  have  almost  surely  continuous  derivatives  up  to  second  order. 
Exact  conditions  on  the  moduli  of  continuity  of  V'(r)  are  given  by  Adler  (1981),  Theorem 
5.2.2,  page  106.  A  simpler  sufficient  condition  for  Gaussian  fields,  using  Theorem  3.4.1  of 
Adler  (1981),  page  60.  is  the  following.  Let  F^(r)  be  the  second  derivative  of  Y(r)  with 
respect  to  components  i  and  j  of  r.  Then  we  shall  require  that  for  some  c  >  0  and  for  all 
ri,r2  €  5 

E([V',(r1)  -  K-jfr,)]2}  =  O(|log||r,  -rjlir11*0) 
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for  all  ri,r2  €  S,  and  all  i,j  =  1 , . . .  ,n.  This  condition  is  satisfied  if.  for  example,  all  third 
derivatives  of  y(r)  have  finite  variance,  which  in  turn  is  assured  if  the  integral  of  the  product 
of  any  pair  of  third  derivtives  of  A:(h)  times  sixth  powers  of  components  of  h  is  finite.  This 
latter  condition  is  met.  for  example,  by  the  Gaussian  kernel  (1.3). 

3  Hadwiger  characteristic  approach 

3.1  Motivation 

Let  Ab  =  {r  :  K(r)  >  6}  be  the  excursion  set  of  K(r)  above  b,  and  let  \{A(,)  be  the  Euler 
or  Euler- Poincare  characteristic  of  5  D  Ai>.  The  Euler  characteristic  counts  the  number 
of  connected  components  of  a  set,  minus  the  number  of  “holes.”  As  the  threshold  level  b 
increases  Adler  (1981)  Theorem  6.4.1,  p.  136.  shows  that  the  “holes”  in  At,  tend  to  disappear 
and  that  we  are  left  with  isolated  regions  each  of  which  contains  just  one  local  maximum. 
Thus  for  large  b  the  presence  of  holes  is  a  rare  occurrence  and  the  Euler  characteristic 
approaches  the  number  of  local  maxima  above  b.  For  even  larger  b  near  the  global  maximum 
I'mu  the  Euler  characteristic  takes  the  value  0  if  <  b  and  1  if  K,.„  >  b.  Hasofer  (1978) 
shows  that 

>  6}  «  P{X(Ab)  >  1}  *  E[X(Ab)\  (3.1) 

as  P{x(A{,)  >  1}  —►  0  for  6  — ♦  oo,  and  so  the  expected  Euler  characteristic  approximates 
the  exceedence  probability  of  K,...  The  advantage  of  the  Euler  characteristic  is  that  its 
expectation  can  be  found  exactly;  the  only  remaining  point  is  how  well  this  approximates 
PIKn^x  >  b). 

Some  justification  for  this  is  as  follows.  Let  /&  =  (K,..  >  b)  be  the  indicator  function  for 
Km..,  where  a  logical  expression  in  parentheses  takes  the  value  one  if  the  expression  is  true 
and  zero  otherwise  (Knuth,  1992),  so  that  P{K.»«  >  6}  =  E (/*).  Provided  5  is  connected  and 
itself  has  no  “holes”  then  the  Euler  characteristic  of  the  excursion  set  \{Ab)  approximates 
the  indicator  function  h  both  for  large  b  (as  shown  above),  and  for  small  6  (since  Ab  =  S 
for  b  small  enough,  and  x($)  =  1)-  Moreover  this  property  holds  for  any  value  of  n,  so  that 
if  the  region  S  is  “squashed”  to  form  a  lower-dimensional  manifold  embedded  in  IR",  then 
x(Aft)  continues  to  approximate  /&  in  the  same  way.  In  particular,  if  n  =  1,  then  the  two  are 
identical  for  all  b.  Thus  the  Euler  characteristic  has  the  correct  behaviour  for  all  shapes  of 
5. 


3.2  Definition 

We  now  define  a  more  convenient  characteristic,  the  Hadwiger  charactersitc,  which  is  de¬ 
fined  on  different  classes  of  sets  than  the  Euler  characteristic,  but  which  equals  the  Euler 
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charactersitic  within  the  domain  of  definition  of  both.  The  advantage  for  us  is  that  it  has 
an  iterative  definition,  given  below,  which  is  more  amenable  to  statistical  analysis. 

Let  C  be  a  compact  subset  of  IR.^  with  a  smooth  boundary,  let  /  =  [sj,s2]  be  the  closed 
interval  between  Si  and  s2  and  let  S  —  C  x  I.  Once  again  we  define  the  excursion  set  At  of 
A'(t.s)  above  a  threshold  b  to  be  the  set  of  points  in  S  where  A'(t,s)  exceeds  b: 

Ab=  {(t,s)€  S:  X(t,s)>  b}. 

Define  the  Hadwiger  characteristic  w{A)  of  a  basic  complex  A  C  S  iteratively  as  follows.  For 
N  =  0,  let  0(A)  be  the  number  of  disjoint  intervals  in  A.  For  N  >  0,  let 

=  EM'4  n  £*)  -  n  £.-)], 

u 

where  £u  =  C  x  {u}  and 

c»(y4  n  £u~)  =  lim^>(A  fi  £v). 

l’?U 

The  Hadwiger  characteristic  is  the  only  characteristic  which  satisfies  the  following  additivity 
property:  if  A,  B,  A  U  B  and  AD  B  are  basic  complexes  then 

rfr{A  U  B)  =  xp(A)  +  v{B)  —  xl>{A  fl  B). 

If  A'(t,s)  is  sufficiently  regular,  as  defined  by  Adler  (1981),  chapter  3,  then  the  excursion 
set  Ab  is  almost  surely  a  basic  complex. 

A  crucial  step  in  deriving  statistical  properties  of  excursion  characteristics  is  to  obtain  a 
point-set  representation  which  expresses  the  characteristic  in  terms  of  local  properties  of  the 
excursion  set  rather  than  global  properties  such  as  connectedness.  To  do  this,  let  0v  be  the 
contribution  to  the  point  set  representation  from  the  interior  of  5,  let  we  be  the  contribution 
from  dC  x  /,  the  “edges”  in  scale  space,  and  let  0b  be  the  contribution  from  £Si ,  the  “base” 
of  5.  Then  with  probability  one 

0(Afc)  =  0V  +  0E  +  0B. 

3.3  N  =  1 

Here  t,  X,  D  and  A  are  scalaxs,  and  set  A  =  A,  say.  For  t  €  dC,  let  X±_  be  the  derivative 
of  X  with  respect  to  t  in  the  inwards  direction  to  C.  Then 

0v  =  E(*«  >  0)l(D  <  0)  -  (D  >  0)](X  =  0)(X  =  6), 
s 

0E  =  E  (*'  >  <  °)(*  =  *)• 

dCxI 


We  can  evaluate  the  expectation  of  the  point  set  representations  following  the  methods 
used  to  prove  Theorem  5.1.1  of  Adler  (1981,  p.  95)  to  get 

E{th)  =  -|C|  r  E(A7D|X  =  0,A-  =  6)^(0.  b)ds. 

EM*)  =  z  r  <  o)ix = b]<f,(b)ds, 

ac 

where  <p\(x,x)  is  the  density  of  (X,  X).  Taking  expectations  first  over  D  conditional  on  X„ 
X  and  X  we  get 

-E(X,+D|X  =  0  .X  =  6)  =  e2,AE[X+(6  +  XJk)]  =  e2'\[bK1/2/(2ir)1/2  +  1/2] 
and  so  for  the  interior  of  5  we  have 

E(vv)  =  \C\  J’*  eads\1/2[bK1/2/(2Tr)112 +l/2]<j>(b)/(2i:)1/2 

=  |C|(e13  -  )(\K)l'2b<t>(b)/(2x)  4-  (|C|/2)(e*2  -  )Xl/2d(b)/(2ir)1'2. 

For  the  "edges,”  each  connected  component  of  C  contributes  two  points  to  dC,  each  with 
A'x  taken  in  opposite  directions;  since  (Xi  <  0)  +  (Xx  >  0)  =  1  and  the  number  of  such 
points  is  2 t/»(C),  then 

E(V’e)  =  V(C)(*  -  s,)*'/:V(6)/( 2i,y'\ 

For  the  “base,”  we  have  from  Worsley  (1993b), 

E(<M  =  |C|e«A>'V(6)/(2ir)>«  +  ^.(C)[l  -  *(»)], 

where  *h(6)  =  P{  A”  <  b]  —  Putting  these  together,  substuting  C\  —  e'*1 , 

<r2  =  e"*‘  for  the  limits  on  <7,  and  re-arranging,  we  have 

E[c(Afc)]  =  \C\(o;1  -  (t2-1)(Ak)1/266(6)/(2x) 

+  (\C\l2)(o;'  +o;l)\''26(b)/(2nf2 

+  XKC)  log(<72/<T1)«1/2<^(6)/ (2?r)1/2 

+  *(C)[  1  -  *(*)].  (3.2) 

3.4  N  >  1 

At  a  point  (t,  s)  €  dC  x  /,  let  tx  €  1RN  be  the  inside  normal  to  dC  and  let  the  columns  of 
(tx,  A),  A  an  N  x  N  -  1  matrix,  be  an  orthonormal  basis  for  IRN.  Let  Xx  =  txX  be  the 
derivative  of  X  normal  to  S,  let  Xt  =  A'X  be  the  N  —  1-vector  of  derivatives  of  X  tangent 
to  5,  and  let  Dt  =  A'DA  be  the  N  —  1  x  N  —  1  second  derivative  matrix  of  X  tangent  to  S. 
Let  cj  be  the  N  —  1  x  N  —  1  curvature  matrix  of  dC  at  t.  Finally  for  a  symmetric  matrix 
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M,  let  x(M)  take  the  value  +1  if  M  has  an  even  number  of  negative  eigen  values,  and  -1  if 
M  has  an  odd  number  of  negative  eigen  values,  including  multiplicities.  Then  generalising 
the  results  of  Adler(1981.  p.  84)  we  have 

0v  =  £(A%>0)X(D)(X  =  O)(*  =  6), 
s 

0e  =  ^(Xi>0)x(DT  +  cTA'i)(A\<0)(XT  =  0)(A'  =  6). 

dCxI 

We  evaluate  the  expectation  of  the  point  set  representations  following  the  methods  used 
to  prove  Theorem  5.1.1  of  Adler  (1981,  p.  95)  to  give 

E(^v)  =  (-1)N|C|  /1JE[A,+ det(D)|X  =  0,  A  =  6)^(0, 6)<fs, 

E(^e)  =  (  — l)""1  I’2  /  E(Aa+det(DT  +  ctA'1)(X1  <  0)|XT  =  0,A  =  b)6A(0,b)dA'tds, 

J  s\  J  dC 

where  oA{x,  x)  is  the  density  of  (A'X,X).  We  shall  evaluate  E(V>v)  bv  first  taking  expec¬ 
tations  over  D  conditional  on  Xs,  X  and  X.  Let  v  =  (1  +  l//c)1/2  and  let  He/v(x)  be  the 
Hermite  polynomial  of  order  N  in  x.  Following  the  arguments  of  Adler(1981,  p.  114)  we  get 

E[det(D)]  =  e2N,det(A)(— u)'vE{Hev[(6  +  XJ^/v)}.  (3.3) 

For  E(V’e),  we  shall  first  take  expectations  over  X±  conditional  on  Df,  Xs,  Xj  and  X.  Let 
Ax  =  A'AA,  so  that  the  distribution  of  Dj  =  Aj1/2DtA^1/2  is  invariant  under  rotations 
in  the  tangent  plane.  Then  letting  Ct  =  At1/2cjAt1/2  we  can  write 

det(Dx  +  ctAj.)  =  det(Ai)det(DT  +  CtXl). 

For  any  m  x  m  matrix  M,  define  detrj(M)  to  be  the  sum  of  the  determinants  of  all  j  x  j 
principal  minors  of  M,  1  <  j  <  m,  and  one  if  j  =  0.  Then  since  any  principal  minor  of  Dj 
has  the  same  distribution  we  can  expand  in  powers  of  X±.  to  get 

N- 1 

E{det(D}  +  CtXl)]  =  £  E[det(DT(Ar"1'j))]detrJ(c^)Xi. 

;=o 

where  Dj is  any  N  —  1  —  j  x  N  —  1  —  j  principal  minor  of  Dj.  Now  from  (3.3)  we 
have 

EfdetfD?*-1--0)]  =  +  X./k)/v)}. 

Since 

E[X{(X1  <  0)]  =  Var(X±)^(-l)220-1>'2r[(;  +  1)/2]/(2jt)1/2,  (3.4) 
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we  get 

E[det(DT +cTXJ.)(Xi  <  0))  =  (-t')A’"‘e2(‘V'l)4det(  AT) 

N-l 

x  ^  e-2j*r-JE{He/v-i-j[(6  +  Xj/fcJ/vDdetr^Cx) 

j=o 

x  Var(Xx)j/22‘J-1)/2r[0  +  l)/2]/(2r)1/2.  (3.5) 

In  principle  we  can  expand  (3.3)  and  (3.5)  in  powers  of  X,,  multiply  by  X*,  take  expectations 
using  (3.4),  and  find  a  general  expression  for  E[0(  A*,)].  The  algebra  is  very  tedious  so  we 
shall  now  consider  the  cases  of  most  interest  where  N  =  2  or  Ar  =  3  and  the  process  is 
isotropic  in  t  for  fixed  s. 

3.5  -V  =  2 

We  can  evaluate  (3.3)  by  expanding  and  using  (3.4): 

t-2E{X,+He2[(6+Xs/«)/f]}  =  E{X,+[(6+X4//c)2-(l  +  l//c)]}  =  K'»(b2 -1  +  1/k)/(2*)''2 +b, 

so  that,  substituting  and  integrating  over  s,  we  get 

E(tM  =  (|Cl/2)e2(iJ-Jl)det(A)1/2[tc1/2(52  -  1  +  l/x)/^)1'2  +  6]<M&)/(2*). 

It  is  hard  to  simplify  (3.5)  without  making  the  further  assumption  that  the  field  is  isotropic 
in  t  for  fixed  s,  so  that  we  can  write  A  =  AI.  Then  conditional  on  X„,  X  =  0  and  X  =  b 

E[det(DT  +  cTXx)(Xx  <  0)]  =  -e2'A(6  +  X./*)/2  -  cTe4A,/2/(27r)1/2 

so  that  conditional  on  X  =  0  and  X  —b 

E(X4+det(DT  +  cTXx)(Xx  <  0)]  =  -e2tX[bK1'2/{2ir)1f2  +  l/2]/2  -  cTeXA*)1/2/(2;r). 
Now  since  §ac  Cjdtx  =  2zrp(C)  we  get 

E(^)  =  r  {\dC\t’\ll2[bKll2l{2*)ll2  +  1/2J/2  +  xl>{C)Kl'2}miV*)ll2ds 

Jti 

=  {|0C|e*-‘  A1'2^1'2/^)1'2  +  1/2J/2  +  tfr(C)(s2  -  s,)«1/2}<^(6)/( 2*r),/2. 
Finally,  from  Worsley  (1993b),  we  have 

E(tfa)  =  |Cie2»«A6^(6)/(27r)  +  (|0C|/2)e*>  X1/2^(6)/(2tt )»/2  +  «,(C)(  1  -  $(6)). 
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Combining  these  and  substuting  C\  =  e~ai,  cr2  =  e  11  for  the  limits  on  a.  we  have 

E[0(At)]  =  (|C|/2)(of2  —  <r2  2)A/c1/2(62  —  1  4-  l/«)c>(6)/(2?r)3/2 
+  (|C|/2)(ar2  +  a2-2)A6<p(6)/(2T) 

+  -  o--1)(A/c)1/26^(6)/(2tt) 

+  (|5(?|/4)(<tJ"1  +  (Tj  1)Al/2d(6)/(2?r)1^2 
-f  t’(C’)  log(<r2 / er i  )/c1/2d>( 6)/(2tt )1/2 

+  ^(C)[l  -  $(6)].  (3.6) 


3.6  .V  =  3 

We  can  evaluate  (3.3)  by  expanding  and  using  (3.4): 

i’3E{A+He3[(6 +*,/*)/*>]}  =  E{A7[(6+A,A)3-3(l  +  l/«)(6  +  A,/«)]} 

=  K1/2(b3  -3 6  +  36//c)/(2tt)1/2  +  3(62  -  l)/2, 

so  that,  substituting  and  integrating  over  s.  we  get 

E(^v)  =  (|C|/3)e3(**-Jl,det(A)1/2[/c1/2(63  -  36  +  36/k)/(2;t)1/2  +  3(62  -  1)/2M6)/(2tt)3/2. 

It  is  hard  to  simplify  (3.5)  without  making  the  further  assumption  that  the  field  is  isotropic 
ir  t  for  fixed  s,  so  that  we  can  write  A  =  AI.  Then  conditional  on  X„  X  =  0  and  X  =  6 

E(det(DT  +  ctXxXXj.  <  0)]  =  e4'A2[(6  +  X./k)2  -  (1  +  1/k)]/2 
+trace(c-r)e3,A3/2(6  +  A,/k)/(2?t)1/2  +  det(cx)e2jA/2, 

so  that  conditional  on  X  =  0  and  X  =  6 

E[A'J+det(DT  +  ctAi)(Ax  <  0)]  =  e4*AV'2(62  -  1  +  l/«)/(2r)1/2  +  6]/2 
+trace(cx)e3*A3/2[6«1/,2/(2ff)^2  +  1/2]/(2tt)1/2  +  (det(cx)/2)e2*Afc1/2/(27r)1/2. 

Define  the  mean  curvature  of  dC  (see  Santalo,  1976,  p.  222)  to  be 

H(dC)  =  ^^trace(cx)dtx/2. 

Now  since  jdC  det(cx)dtx  =  4? r^(C)  we  get 

E(tfo)  =  /42{|5C,je2*A(«1^2(62  —  1  -j-  l//c)/(2ir)1^3  -j-  6]/2 

+  2i/(0C)e*A1'2[6K1/7(2jr)1/2  +  1/2]/(2tt)1/2  +  u>(C)(2irK)1/2}o(b)/(2*)ds 
=  {(|5C|/2)(e2**  -  e2'1  )A[k1/2(62  -  1  +  l/«)/(2?r)1/2  +  6]/2 

+  2 H(dC)(e‘*  -  e*1  )A1/2[6«1/2/(2tt)1/2  +  l/2]/(2r)I/2  +  tl>(C)(s7  -  Sl)(2nK)1/2}d>(b)/(2ir). 
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Finally,  from  Worsley  (1993b),  we  have 


E(V>b)  =  |C|e3'*A3/2(62-  l)^6)/(2;r)3/2  +  (|5C|/2)e2llA6©(6)/(27r) 
+  (//(3C)/7rjeJlA1/2^(6)/(27r)1/2  +  V(C)[1  -  *(6)1. 


Combining  these  and  substuting  ax  =  e-JJ,  <72  =  e-*1  for  the  limits  on  a.  we  have 


E[tf(i4*)]  =  |C!(l/3)(<rr3  -  <t2-3)A3'V/2(63  -  36  +  36/k)o(6)/(27t)2 
+  (|C|/2. )K~3  +  <r^3)A3/2(62  -  1)<£(6)/(2jt)3/2 
+  (|5C|/4)((7r2  -  (t2'2)A«1/2(62  -  1  +  l//c)<6(6)/(27r)3/2 
+  (i9C|/4)(o,f2  4-  cr2  2 )A6<6( 6)/ (2tt) 

+  [^(aO/rKar1  -  crf1)(A/c)1/26<i>(6)/(2ff) 

+  [//(dCl/MK1  +  OAvW)/(2*)!/2 

+  v(C)\og(cr2/<T1)Kl,26(b)/{2-)l/:i 

+  v(C)[l  -  *(6)]. 


(3-7) 


It  is  straightforward,  though  cumbersome,  to  generalise  to  piece-wise  smooth  sets  C.  Let 
dCs  be  the  smooth  part  of  dC,  and  let  8Ce  be  the  smooth  curves  or  “edges”  that  bound 
the  smooth  parts  of  dC.  Let  6  be  the  internal  angle  between  the  normals  to  the  two  parts 
of  dC s  on  either  side  of  8Ce,  and  let  tg  be  a  unit  vector  tangent  to  dC^.  Then  the  above 
results  will  hold  with  H(dC)  replaced  by 


H(dC)  =  (J£dC  trace(cx)dtT  +  j 2. 


If  dC  is  smooth  everywhere,  so  that  the  second  term  is  zero,  then  H(dC)  is  the  mean 
curvature  of  dC  as  before.  If  C  is  a  polyhedron,  so  that  the  first  term  is  zero,  then  H(dC) 
is  half  the  sum  of  the  lengths  of  the  edges  of  C  multiplied  by  their  angular  deficiency.  If 
C  is  convex,  let  A(dC)  be  the  average,  over  all  rotations,  of  the  maximum  perpendicular 
distance  between  two  parallel  planes  that  touch  dC\  this  is  known  in  stereology  as  the  mean 
caliper  diameter  of  C.  Then  it  can  be  shown  that  A(dC )  =  H(dC)/(2x)  (see  Santalo,  1976, 
p.  226).  Values  of  H(dC)  for  some  common  geometric  solids  are  given  by  Santalo  (1976,  p. 
229). 


3.7  Manifolds  in  1R3 

We  can  now  find  the  result  for  a  piece-wise  smooth  two  dimensional  manifold  C  embedded 
in  IR3  by  thickening  it  slightly,  applying  the  above,  and  taking  the  limit  as  the  thickness 
tends  to  zero.  ^From  (3.7)  we  see  that  the  result  is  identical  to  (3.6)  in  two  dimensions, 
obviously  a  special  case  when  C  is  flat.  Thus  no  matter  how  C  is  folded  or  even  creased,  the 
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expectation  of  the  Hadwiger  characteristic  is  the  same,  even  though  the  correlation  structure 
on  C  depends  on  the  folding  and  is  not  necessarily  stationary  unless  C  is  flat. 

If  C  is  a  piece-wise  smooth  two  dimensional  surface  homeomorphic  to  the  surface  of  a 
sphere,  \dC\  =  0  and  w{C)  =  2.  This  result  could  be  useful  for  directional  data,  such  as  the 
astronomical  example,  which  is  modelled  as  a  random  field  on  the  surface  of  a  sphere.  How¬ 
ever  the  fact  that  C  contains  a  "hole’’  means  that  E[v(>U)]  may  not  be  a  good  approximation 
to  P  { -Vm4x  >  6}  for  small  b. 

4  Volume  of  tubes 

An  alternative  to  the  methods  of  Section  3  is  the  differential  geometric  method  of  Weyl 
(1939).  See  also  Naiman  (1990),  Knowles  and  Siegmund  (1989),  Sun  (1993),  and  Siegmund 
and  Zhang  (1994).  As  will  become  apparent  below,  it  is  not  obvious  that  we  are  in  general 
evaluating  the  same  quantity  as  was  calculated  in  Section  3,  so  it  is  reassuring  to  obtain 
the  same  answer  in  particular  cases.  However,  our  main  goal  is  not  only  to  reproduce  the 
answer  by  different  means,  but  also  to  try  to  add  some  insight.  The  method  of  tubes  breaks 
the  complicated  calculation  into  a  large  number  of  small  pieces  whose  relative  importance 
can  to  some  extent  be  assessed  without  performing  the  entire  calculation.  Hence  the  method 
might  be  suitable  for  application  when  one  would  prefer  a  simple  approximate  result  to  a  more 
complicated  exact  one.  Alternatively,  since  each  of  the  pieces  has  its  own  geometric  meaning, 
the  method  illuminates  some  mysterious  expressions  obtained  by  the  formal  evaluations  of 
Section  3.  To  this  end  we  show  the  geometric  origin  of  the  somewhat  enigmatic  term  involving 
1/k,  which  appears  in  the  display  (3.6)  for  the  case  N  =  2  and  in  the  corresponding  formulas 
for  larger  values  of  N,  but  which  does  not  appear  in  (3.1)  for  N  =  1. 

Since  there  is  no  particular  advantage  in  reparameterizing  a  as  exp (— s)  as  in  Sections 
2  and  3,  we  consider  the  Gaussian  X(t,cr)  defined  in  (1.4),  which  has  the  Karhunen-Loeve 
expansion  (cf.  Loeve  (1963,  p.  478)): 

*(♦,*)  «!>(*•  »>Z*-  (4.1) 

k 

where  the  Z's  are  independent  standard  normal  and 

Cov(X(t1,<72),X(t2,<r2)]  =  £7fc(ti,<7ib*(t2,<72).  (4-2) 

k 

If  the  series  in  (4.1)  terminates  after  a  finite  number  of  terms,  the  following  discussion 
yields  a  very  precise  approximation  to  the  tail  of  the  distribution  of  when  £  =  0.  It 
presumably  applies  approximately  otherwise,  but  a  delicate  interchange  of  limits  is  involved. 
See  Sun  (1993)  for  a  detailed  discussion  of  the  nature  of  this  approximation  in  a  special  case. 


We  omit  discussion  of  these  technicalities  and  henceforth  proceed  formally  as  if  the  series 
(4.1)  contains  m  terms.  It  turns  out  that  ultimately  all  our  calculations  are  intrinsic  to  the 
manifold  M  defined  below,  so  the  quantities  we  evaluate  are  the  same  whether  or  not  the 
series  terminates  after  finitely  many  terms. 

If  (4.1)  is  divided  and  multiplied  by  the  norm  of  the  vector  Z  =  (Zit . . . ,  Zm)\  it  takes 
the  form  of  the  product  of  (7(t,rr),U)  and  ||Z||,  where  7(t,rr)  =  (“i(t, a ),..., 7m(t,<r))' 
U  is  uniformly  distributed  on  the  unit  sphere  in  m  dimensions  and  ||Z||  is  independently 
distributed  asa^  random  variable  with  m  degrees  of  freedom.  By  conditioning  on  ||Z||,  we 
can  reduce  the  problem  of  evaluating  the  tail  of  the  distribution  of  (1.5)  when  £  =  0  to  that 
of  evaluating 

P{max  (7(t,<r),U)  >  w}  (4.3) 

t.o 

for  values  of  w  close  to  one.  Moreover,  by  (4.2)  the  vector  7  defines  a  parameterized  subman¬ 
ifold  of  the  unit  sphere,  so  (4.3)  can  be  interpreted  geometrically  as  the  volume  of  the  tube 
of  geodesic  radius  cos'^tu)  about  this  manifold,  divided  by  the  volume  of  the  unit  sphere. 
We  seek  to  calculate  the  volume  of  that  tube. 

Let  M  =  {7(t,<r)  :  t  €  C,o\  <  a  <  <72}.  The  metric  tensor  (first  fundamental  form)  of 
the  manifold  M  is  given  by  the  (N  +  1)  x  (N  +  1)  matrix  with  entries  (7,,  7,),  where  the 
subscripts  denote  differentiation  with  respect  to  the  corresponding  argument.  Assuming  k 
is  symmetric  about  zero  in  each  of  its  arguments  and  invariant  under  permutation  of  its 
arguments,  we  see  from  the  covariance  calculations  of  Section  2  that  the  matrix  is  diagonal: 
the  first  N  diagonal  entries  equal  A/<72,  and  the  final  entry  is  x/er2.  It  will  be  useful  below  to 
note  that  in  the  special  case  A  =  k  =  1,  this  is  the  metric  tensor  of  hyperbolic  space  having 
constant  curvature  -1  (e.g.,  Boothby  (1986,  p.  404)). 

For  N  =  1,  Corollary  2  of  Knowles  and  Siegmund  (1989)  (corrected  for  minor  errors  of 
calculation)  yields 

P{max  (7(t,  <r),  Z)  >  i)  =  |A/|WW/(2*)  +  (|0A/|/2)^(*)/(2^)V*  +  X(.V)(1  -  *(i)][l  +  o(l)] 

(4.4) 

as  b  — ►  00.  Here  \M\  is  the  area  of  the  manifold,  |5M|  is  the  length  of  its  boundary,  and  x(Af) 
is  its  Euler- Poincare  characteristic  (equal  to  1  if,  as  we  assume  for  simplicity,  C  is  an  interval 
of  real  numbers).  In  the  present  case  the  area  element  of  the  manifold  is  (A ny^dtda/a2,  so 
|A/|  =  |C|(<rf 1  —  1)(A/c)1/2;  also  \dM\  =  |C|(<7f 1  -f-cf ') A1/2  -f  «1/f2log(<72/<Ti),  so  the  right 

hand  sides  of  (4.4)  and  (3.2)  are  the  same. 

It  is  interesting  to  note  that  in  the  derivation  of  (4.4)  the  Euler  characteristic  x{M)  arises 
via  an  application  of  the  Gauss-Bonnet  Theorem  as  (2tt)_1  times  the  sum  of  three  terms:  (i) 
the  Gaussian  curvature  of  the  manifold  integrated  with  respect  to  the  area  element,  (ii)  the 
geodesic  curvature  of  the  boundary  of  M  integrated  with  respect  to  arc  length,  and  (iii)  the 
sum  of  the  angles  through  which  the  tangent  to  the  boundary  of  M  rotates  at  the  comers. 
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See,  for  example,  Millman  and  Parker  (1977,  p.  185).  In  the  present  case  it  may  be  shown  by 
standard  elementary  calculations  based  on  the  metric  tensor  given  above  that  the  Gaussian 
curvature  is  the  constant  —  1  /«.  For  fixed  t,  as  a  function  of  a  the  curve  7(t,ff)  is  geodesic, 
so  its  geodesic  curvature  is  0.  For  a  =  <7„,  as  a  function  of  t  the  geodesic  curvature  of  7(t,<r„) 
is  (— provided  the  boundary  is  oriented  in  the  customary  counterclockwise  order 
in  the  (t,<r)  plane.  We  omit  the  details,  since  the  case  N  =  2  discussed  below  is  similar  in 
principle  and  more  complcated  in  detail.  Hence,  before  adding  the  integral  over  the  manifold 
of  the  Gaussian  curvature  and  the  integral  over  the  boundary  of  the  geodesic  curvature,  we 
have  obtained  two  terms  involving  1/«1/2,  which  then  sum  to  zero. 

We  now  consider  the  case  N  =  2.  As  the  arguments  of  Weyl  (1939)  show,  the  first  and 
second  order  terms  again  involve  the  volume  of  the  manifold  M  and  the  area  of  dM,  and 
are  easily  evaluated.  (See  Knowles  and  Siegmund  (1989)  or  Naiman  (1990)  for  details.)  For 
example,  in  the  present  case  the  volume  element  is  dV  =  X^^dt^dt^da /a3.  so 

I.V/I  =  |C|K:  -  (4.5) 

The  terms  in  b2<t>(b)  and  b<j>(b)  are  the  same  as  in  (3.6).  For  many  applications  these  terms 
will  provide  an  adequate  approximation  by  themselves. 

The  other  terms  involve  the  curvature  of  the  manifold,  the  curvature  of  the  boundary, 
and  lower  dimensional  boundary  corrections.  A  complete  derivation  involves  more  advanced 
tools  of  differential  geometry.  We  sketch  the  possibilities  here  by  deriving  the  term  involving 
l/*c  in  the  first  line  of  (3.6). 

By  the  methods  of  Knowles  and  Siegmund  (1989)  one  can  see  that  this  term  arises  from 

/  JdV  +  2  /  HadAd.  (4.6) 

JM  JdM 

The  expressions  in  (4.6)  are  as  follows.  Let  n„,  v  =  1,  •  •  • ,  m  —  N  —  1  be  mutually  orthogonal 
unit  normals  to  the  tangent  space  of  M.  Then  J  is  the  sum  over  v  of  the  sum  of  pairwise 
products  of  the  eigenvalues  of  the  Weingarten  map  (Millman  and  Parker  (1977,  p.  125)) 
in  the  direction  n„;  dV  is  the  volume  element  of  the  manifold;  H$  is  the  geodesic  mean 
curvature  of  the  boundary,  i.e.,  the  mean  curvature  in  the  direction  of  a  vector  which  is 
normal  to  the  boundary  of  M,  but  lying  in  the  tangent  space  of  M  pointing  into  the  interior 
of  M ;  and  dAa  is  the  area  element  of  the  boundary.  (The  expression  (4.6)  appears  in  a  brief 
discussion  of  the  case  of  a  three  dimensional  manifold  in  a  preliminary  version  of  Knowles 
and  Siegmund  (1989),  but  not  in  the  published  version.) 

Up  to  a  constant  multiple,  J  is  the  scalar  curvature  of  the  manifold.  It  can  be  identified 
as  such  and  laboriously  calculated  directly  from  the  metric  tensor  of  the  manifold.  See 
Sun  (1993)  for  a  discussion  and  summary  of  the  algorithm.  (For  a  manifold  of  the  simple 
structure  of  M,  we  suggest  a  different  method  below.)  The  result  is  J  =  —  3/«.  a  constant. 


so  by  (4.5) 


JdV  =  -3A«-1/2!C|(<7r2  -  ct~2)/2.  (4.7) 

We  now  consider  the  geodesic  mean  curvature  of  the  boundary.  It  turns  out  that  we  are 
primarily  interested  in  the  image  of  dC  x  {<ri,<r2},  since  the  rest  of  the  boundary  gives  rise 
to  one  of  the  other  terms  in  (3.6).  Hence  assume  that  a  is  fixed  and  consider  the  surface 
7(t,cr)  as  a  function  of  t  with  unit  normal  073/ k1/2.  From  the  covariance  calculations  in 
Section  2  it  follows  easily  that  (7,,,  73)  =  A/u3  for  i  =  1,2.  Hence  since  the  metric  tensor 
is  diagonal,  the  matrix  of  the  Weingarten  map  (Millroan  and  Parker  (1977,  p.  125))  has 
diagonal  entries  (7»,73)/(l!73||||7«||2)  =  [<rA/(#c1/2cr3)]/( A/cx2)  =  k~1/2,  so  2 H  =  2k_i/2.  In 
order  to  maintain  our  convention  that  H  be  calculated  with  respect  to  an  inward  pointing 
normal,  its  sign  should  be  negative  at  0  =  <72  and  positive  at  0  =  o\ .  Since  the  area  element 
is  A<r-2 dtidti,  it  follows  that  the  integral  of  twice  the  geodesic  mean  curvature  over  the  image 
of  dC  x  {0-1,02}  is 

J  2 HddA9  =  2AK~l/2|C|(<rf2  -  0 2'2).  (4.8) 

Addition  of  (4.7)  and  (4.8)  yields  the  coefficient  associated  with  the  1  /k  term  on  the  first 
line  of  (3.6). 

The  geodesic  mean  curvature  at  a  point  p  on  the  image  of  dC  x  (01,07)  leads  to  another 
term  in  (3.6).  For  simplicity  we  assume  that  C  has  a  smooth  boundary.  The  point  p  can  be 
represented  parametrically  by  p  —  a(t,o),  where  for  each  fixed  value  of  0,  as  a  function  of 
t,  or  is  a  unit  speed  curve.  Tangent  vectors  to  the  surface  a  are  dot/ 60  =  73  and  da/dt  = 
A_1/,2cr[7i  cos(0)+72  sin(0)].  The  relevant  (unit)  normal  is  n  =  A-1/2er[— 7!  sin(0)+72  cos(0)]. 
Obviously  6  is  the  angle  of  rotation  carrying  71,72  into  da/dt,  n.  By  differentiating  the 
relation  H73II2  =  k/o2  with  respect  to  t(  and  (73, 7i)  =  0  with  respect  to  0,  we  see  that 
(733*7 i)  =  0  for  i  =  1,2.  It  follows  that  one  diagonal  element  of  the  second  fundamental 
form,  and  since  the  metric  tensor  is  diagonal  one  diagonal  element  of  the  matrix  of  the 
Weingarten  map  is  zero.  The  other  diagonal  entry  is  (d2a/dt2,  n),  which  is  by  definition  the 
geodesic  curvature  of  the  curve  a  in  the  surface  7(t,<r)  considered  as  a  function  of  t  with  0 
fixed.  Hence  the  trace  of  Weingarten  map  is  2 He  =  kg,  the  geodesic  curvature  of  a.  The 
area  element  at  p  is  given  by  dAa  =  (*)l/2<7-ldtd<7.  Since  the  surface  7(t,a)  for  fixed  0  has 
as  metric  tensor  a  multiple  of  the  identity,  its  Gaussian  curvature  is  zero.  Hence  the  integral 
of  the  geodesic  curvature  of  the  boundary  with  respect  to  arc  length  is  2r  times  the  Euler 
characteristic  of  the  surface,  or  equivalently  the  Euler  characteristic  of  C.  Hence  the  integral 
of  2 Ha  over  this  part  of  dM  yields  2z,0(C)«1^2log(<72/<7i),  which  appears  in  the  third  line 
of  (3.6). 

The  use  of  differential  forms  simplifies  certain  calculations  associated  with  the  volume 
of  tubes.  It  is  a  particularly  attractive  alternative  in  the  present  case  of  a  diagonal  metric 
tensor,  since  it  allows  us  to  calculate  directly  and  easily  the  curvature  J  without  necessarily 
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identifying  it  as  a  multiple  of  the  scalar  curvature  of  the  manifold.  Although  these  calcula¬ 
tions  should  be  regarded  as  standard,  there  does  not  seem  to  be  a  convenient  reference,  and 
it  is  not  difficult  to  proceed  from  first  principles,  at  least  in  low  dimensional  problems.  A 
useful  general  reference  is  Boothby  (1986). 

Since  our  main  goal  here  is  insight,  we  shall  consider  the  case  of  a  tube  about  a  manifold 
embedded  in  m-dimensional  Euclidean  space.  The  added  features  of  the  calculation  for  a  tube 
about  a  manifold  embedded  in  the  unit  sphere  are  discussed  in  Knowles  and  Siegmund  (1989). 
Except  for  a  brief  remark  at  the  end  of  this  section,  we  shall  also  restrict  ourselves  to  the 
case  of  present  interest,  namely  a  three  dimensional  manifold.  Although  the  methods  work 
also  in  higher  dimensions,  the  computational  complications  can  become  quite  substantial. 

We  denote  by  e,,t  =  1.  •  •  • .  m  an  orthonormal  moving  frame  for  lRm.  with  the  first  three 
elements  a  frame  for  the  tangent  space  of  the  manifold  M  and  the  rest  orthogonal  to  M. 
Let  {u/}  and  {uj}  denote  the  dual  forms  and  connection  forms  respectively.  As  usual  we 
assume  the  u rs  are  restricted  to  the  tangent  space  of  the  manifold,  so  u*  =  •  •  •  =  um  =  0.  A 
point  on  the  manifold  satisfies  dp  =  £ fu/'e,.  Recall  also  that 

de,  =  (4.9) 

>=t 

=  (4.10) 

;=i 

and  uf  =  —  u£. 

For  all  sufficiently  small  a,  a  point  in  the  tube  of  radius  a  about  M  (except  for  points 
whose  closest  point  in  the  manifold  is  on  the  boundary  of  the  manifold)  can  be  parameterized 
as 

9  =  P  +  £  (4.11) 

4 

where  Y,tf  <  a2.  We  want  to  identify  1-forms  (u>\  =  1,* •  •  ,m}  such  that  dq  —  Y? ule.i- 
Then  the  volume  of  the  tube  can  be  obtained  by  integrating  the  volume  element 

dV \q)  =  w1  A  •  •  •  A  wm.  (4.12) 

Differentiating  (4.11)  and  applying  the  structural  equation  (4.9),  we  see  after  collecting 
terms  that  u>,  =  u tjuj  for  i  =  1,2,3  and  u,  =  dtt  —  YjLt  tj&j  for  4  <  j  <  m.  In 
taking  the  wedge  product  of  the  u>\  we  note  that  since  the  manifold  M  is  three  dimensional 
any  wedge  product  involving  more  than  3  among  the  various  u>‘  and  vanishes.  Hence  for 
4  <  i  <  m  the  sums  can  be  neglected.  It  follows  that  the  volume  element  (4.12) 

can  be  expanded  as  a  sum  of  wedge  products  of  the  following  form.  Each  term  contains 
dti  A  •  •  •  A  dtm.  One  term  contains  w1  A  •  •  •  A  u;3.  When  integrated  over  the  product  of  M 
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and  the  ball  of  radius  a.  it  yields  the  volume  of  the  manifold  multiplied  by  the  volume  of  the 
ball.  There  are  also  terms  containing  some  of  the  t}  raised  to  the  first  or  the  third  power. 
These  terms  vanish  when  integrated  over  the  bail  of  radius  a.  Finally  there  is  an  expression 
of  the  form 

m 

T  ^[u?1  A  ^  A  U3  +  Uj  A  w2  A  +  ufj  A  A  u>3]  A  dt4  A  •  •  •  A  dtm.  (4.13) 

Integration  of  (4.13)  over  the  ball  of  radius  a  yields  the  same  constant  for  all  values  of  j, 
so  there  remains  the  problem  of  evaluating  the  integral  over  the  manifold  of 

m 

J>2  Au^Au^  +  ^Au^Ao^+WiAo^A  u;3]-  (4-14) 

Since  this  is  a  3-form  on  a  3  dimensional  manifold,  it  must  be  a  scalar  multiple  of  the  volume 

element  uj1  A«2  Aw3.  From  w1  Aw2  Aw3(ei,e2,e3)  =  1  it  follows  that  to  evaluate  this  scalar, 
it  suffices  to  evaluate  (4.14)  at  (ei,  62,63),  and  since  u)'(ek)  =  6,.*,  this  evaluation  yields  a 
sum  over  j  of  a  sum  of  three  determinants  of  2  x  2  matrices  whose  i,  k  element  is  1 *4(e*).  In 
terms  of  the  curvature  forms 

n?  =  du>,fc-£w‘Auf  (4.15) 

i= 1 

( i,k  =  1,2,3)  (cf.  Boothby  (1986,  p.  390),  which  by  (4.10)  equals  -'E?=4uj  A  a^,  this 
evaluation  of  (4.14)  becomes 

-  E  «?(*.'*)•  (4.16) 

l<i<k<3 

Finally,  to  evaluate  (4.16),  we  follow  with  minor  changes  the  example  in  Boothby  (1986, 
p.  408),  which  deals  with  the  special  case  of  our  manifold  with  A  =  k  =  1.  We  find  that 
<*£  =  —  ^3.«wfe)  (*,&  =  1,2,3),  and  hence  Qf  =  —  wf  Ai^  =  k~1u>'  Aw*.  Hence 

(4.16)  equals  — 3/k,  which  is  exactly  the  value  of  J  given  above,  as  it  should  be. 

Remarks.  Evaluation  of  the  geodesic  mean  curvature  of  the  boundary  by  this  method  is  also 
easy.  For  example,  in  the  case  of  fixed  <r,  we  put  e3  =  73/H73II  (subject  to  a  correct  selection 
of  the  sign).  A  calculation  similar  to  that  given  above  shows  that  the  desired  quantity  is 
u1  A u>2  +  Wj  A  w2,  evaluated  at  ci,  e2-  This  equals  (62)  +  uff(e  1 )  =  2 x*x^2  =  2 H,  as  above. 

The  case  N  =  3  can  be  handled  similarly,  although  now  there  are  some  additional  terms 
to  consider.  The  most  interesting,  although  in  the  end  it  makes  no  contribution  to  the 
final  formula,  involves  the  Gauss-Bonnet  integrand  for  the  four  dimensional  manifold.  It 
arises  from  the  4-form  fi2  A  Cl*.  As  in  the  case  N  =  1,  where  the  integral  of  the  Gaussian 
curvature  over  the  manifold  is  exactly  canceled  by  the  integral  of  the  geodesic  curvature 
of  the  boundary,  the  integral  over  the  manifold  of  this  4-form  is  canceled  by  the  integral 
over  the  boundary  faces  where  a  =  <Ti  or  <72  of  a  three  form  involving  the  sum  of  twice 
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vjj  AujAwj  and  wj  A  fij  —  u;,  A  Hf  +  fi?  AuJ.  These  forms  are  defined  in  terms  of  the  frame 
t\,  •  •  • .  e4  adapted  to  the  manifold  with  e},  e2,  e3  spanning  the  tangent  space  of  the  boundary 
and  C4  =  74/||74||  perpendicular  to  the  boundary. 

5  Power 

Suppose  now  that  a  signal  is  present,  so  we  observe  the  field  satisfying 

dZ(t)  =  oZmtf[o ol(t  -  to)]*  +  dW(t), 

where  /  is  a  positive,  smooth  function,  to  is  the  unknown  location  of  the  signal,  er0  are 
nuisance  parameters  and  dW  is  white  noise.  After  smoothing  with  the  kernel  a“;v/2Jb[o,_l(h— 
t)],  where  k  is  chosen  to  equal  /  and  a  to  equal  <70  insofar  as  possible,  we  are  interested  in 
the  Gaussian  field 

A'(t,<r)  =  fc[o-1(h  -  t)]dZ(h),  (5.1) 

which  has  mean  value 

H(t,a)  =  ^c<tq)~ni'1  j  f[<To  ‘(h  -  t0)]fc[<r_1(h  -  t)]dh  (5.2) 

and  the  covariance  behavior  described  in  Section  2.  The  power  of  the  test  suggested  in 
Section  1  is 

?{Xmux  >  b }.  (5.3) 

Both  (5.2)  and  (5.3)  depend  on  the  unknown  parameters  £,t0,Oo,  although  this  dependency 
is  suppressed  in  the  notation.  If  the  range  of  <7  includes  the  true  value  <r0,  we  expect  most 
of  the  probability  (5.3)  to  arise  from  the  probability  that  Jf(t0,Oo)  exceeds  6,  which  equals 
1  —  $(b  —  /jo),  where  Ho  =  /j(to,o0)  and  $  is  the  standard  normal  distribution  function. 
Hence  we  begin  with  the  decomposition  of  (5.3)  as 

1  -*(6- W)+  f  P{J„  >  (5.4) 

We  consider  below  two  different  situations  regarding  the  choices  of  k  and  a:  (i)  k  =  / 
and  a  is  chosen  adaptively  as  discussed  in  Sections  2-4,  and  (ii)  k  and  a  are  arbitrary,  but 
fixed,  and  in  general  differ  from  /  and  <7o.  In  the  former  case  we  asstime  that  <7j  <  Oo  < 

We  begin  with  consideration  of  case  (i).  Note  that  in  this  case  /t0  =  £■  Also,  from  the 
form  (1.6)  of  the  likelihood  function  it  follows  that  X(to,oo)  >s  sufficient  for  £  and  hence 
the  conditional  probability  in  (5.4)  can  be  evaluated  assuming  that  £  =  0.  To  emphasize 
this  situation  we  write  P0  in  place  of  P.  Our  approximation  is  based  on  the  following 
considerations  that  have  proved  useful  in  related  contexts  (e.g.,  Siegmund  (1985,  pp.  202- 
203),  James,  James,  and  Siegmund  (1987)).  If  b  and  £  are  large,  then  large  values  of  the 
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sample  paths  of  X  are  likely  to  occur  near  to  t0,o0.  The  conditional  probability  in  (5.4)  will 
be  small  unless  X(t0,  oo)  is  close  to  6,  say  A/-(to,<70)  =  b—  y,  with  y  close  to  0.  Hence  if  the 
sample  paths  of  the  field  are  to  cross  the  level  b.  they  must  do  so  in  a  small  neighborhood  of 
the  value  to,  a0.  Moreover,  as  noted  by  various  authors  (e.g.,  Adler  (1981,  p.  157),  Aldous 
(1989,  p.  65),  Leadbetter,  Lindgren  Rootzen  (1983.  pp.  201  ff.),  if  X  takes  on  a  large 
value  at  t0,  <r0,  then  in  a  small  neighborhood  of  that  point  it  will  behave  very  much  like  a 
quadratic  function  whose  random  part  will  come  from  the  linear  term.  This  suggests  the 
local  approximation 

A'(t, <r)  «  (6  -  y)  +  u'A(to,<To)  +  u'Eo{A'(t0,<T0)|X(to, cr0)  =  6  -  y]u/2,  (5.5) 

where  u  =  ((t  —  t0)',  cr  —  £r0)'.  The  right  hand  side  of  (5.5)  will  exceed  b  for  some  value  of  u 

if  and  only  if  its  maximum  does.  A  straightforward  maximization  of  the  right  hand  side  of 
(5.5)  yields 

b-y  -  A'(t0,ao)'{Eo[A'(to,o0)|A'(to,<ro)  =  b  -  y]}_1A'(to,<r0)/2,  (5.6) 

which  after  an  evaluation  of  the  indicated  conditional  expectation  of  the  Hessian  of  X  yields 

b  -  y  +  X(t0,  <t0),E-1X( to,  cx0>/[2(6  -  y)],  (5.7) 

where 

E  =  -Eo[X(t0,ao)X(to,<ro)]  =  Eo[X(to,oo)X(to,<ro)'].  (5.8) 

The  quadratic  form  in  (5.7)  is  independent  of  X(to,<7o)  and  is  distributed  as  a  \2  random 
variable  with  n  =  N  +  1  degrees  of  freedom.  Hence  the  conditional  probability  in  (5.4)  is 
approximately,  in  the  obvious  notation, 

P{Xn  >  2y(6-y)}.  (5.9) 

Since  we  are  assuming  that  b  is  large  and  expect  that  the  important  values  of  y  are  close 
to  0,  we  propose  to  replace  b  —  y  by  b  in  (5.9)  when  we  substitute  into  (5.4).  Likewise  we 
neglect  the  term  involving  y2  in  <f>(b  —  £  -  y).  Then  the  integral  in  (5.4)  takes  the  form  of  a 
Laplace  transform,  which  after  an  integration  by  parts  is  easily  shown  to  equal 

1  _  *(»  -  o  +  <0(6  -  f)[l  -  WtfnV((  -  b).  (5.10) 

Remarks.  Except  for  the  smoothness  and  behavior  for  large  t  necessary  to  permit  the 
expansion  (5.5)  and  the  integration  by  parts  implicit  in  (5.8),  there  are  essentially  no  con¬ 
ditions  imposed  on  /.  It  appears  that  with  a  more  careful  analysis  the  preceding  argument 
requires  that  X  be  only  once  continuously  differentiable,  not  twice  differentiable,  although 
the  stronger  assumption  is  satisfied  in  the  applications  discussed  below. 
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We  now  turn  to  case  (ii).  Although  the  conditional  probability  in  (5.4)  now  depends 
on  the  underlying  parameters,  the  argument  given  above  continues  to  hold  with  suitable 
modifications.  In  particular,  in  the  following  analogue  of  (5.5)  the  gradient  vector  X  and 
Hessian  matrix  X  involve  differentiation  with  respect  to  the  N  spatial  variables,  but  not  the 
scale  factor  <7,  which  is  constant.  Instead  of  (5.5)  we  have 

X(t,  a)  «  6-  y  +  (t  -  to)'AT(to,  a)  +  (t  -  t0)'E[A(t0,  a)\X(t0,<r)  =  b  -  yj(t  -  to)/2.  (5.11) 
Also 

E[X\X]  =  E[X]  -  Cov[A\  A')E(A')  +  Cov[X,X]X,  (5.12) 

but  unlike  case  (i)  above  the  sum  of  the  first  two  terms  does  not  vanish.  We  shall  assume 
that  /  and  k  are  symmetric  about  zero  in  each  argument  and  invariant  under  permutations 
of  the  arguments,  so  that  the  right  hand  side  of  (5.12)  is  the  product  of  the  identity  matrix 
and  a  scalar  of  the  form 

£if  +  A<y-2A(t0,  cr).  (5.13) 

Here  £  and  A  are  as  defined  above,  and  T)  is  easily  evaluated  in  terms  of  integrals  involving 
k,  /,  and  their  partial  derivatives.  We  omit  the  general  expression.  In  the  special  case  that 
both  k  and  /  are  the  Gaussian  kernels  (1.3) 

V  =  (2<r2)-'[2 <W(<rJ  +  ^)}N/2[^2  -  <^)/V  +  *2)].  (5.14) 

The  right  hand  side  of  (5.11)  can  be  maximized  as  above,  and  the  resulting  quadratic  form 
is  a  multiple  of  a  \2  random  variable  with  N  degrees  of  freedom.  This  can  be  substituted 
into  (5.4)  and  integrated  approximately  as  above  to  yield  as  an  approximation  to  the  power 
in  case  (ii) 

1  -$(b- fi0)  +  {no-b)~1<t>(b- no){l  -[{h-\-<r2ivl\)l{no-\-a2ir]IX)}NI2}.  (5.15) 

For  the  special  case  of  Gaussian  kernels,  77  is  given  by  (5.14),  A  =  1/2,  and 

Ho  =  ([2<7<to/ (a2  +  <tI)]NI2.  (5.16) 

Note  that  (5.10)  and  (5.15)  are  the  same  if  <r  =  <r0  and  N  =  n. 

We  have  investigated  the  numerical  accuracy  of  the  approximation  (5.15)  for  the  special 
case  N  =  1,<7  =  <7o.  In  one  dimension  it  is  possible  to  use  the  expected  number  of  upcross- 
ings  of  the  process  X  to  give  a  tight  upper  bound  on  the  conditional  probability  in  (5.4), 
which  can  then  be  integrated  numerically  to  give  a  tight  upper  bound  on  (5.4)  (cf.  Davies 
(1977),  Knowles,  Siegmund,  and  Zhang  (1991)).  From  such  a  comparison,  it  appears  that 
the  x2  approximation  for  the  conditional  probability  is  very  good  for  small  y,  although  it  de¬ 
teriorates  for  larger  values.  Over  a  wide  range  of  values  for  b  and  £  the  upper  bound  and  the 
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approximation  (5.15)  are  in  quite  good  agreement-  usually  to  the  first  two  significant  figures. 
We  also  tried  substituting  the  \;2  approximation  for  the  conditional  probability  directly  into 

(5.4)  and  integrating  numerically  with  essentially  the  same  results. 

An  interesting  application  of  the  preceding  calculations  is  to  compare  the  efficiency  of 
adaptive  choice  of  the  scale  parameter  <r,  as  suggested  above,  with  an  arbitrary  fixed  (but 
frequently  incorrect)  value.  For  am  example  we  suppose  that  N  =  2.  We  also  assume  that 
both  /  and  k  are  Gaussian  and  that  C  is  a  T  x  T  square  with  T  =  100.  (This  is  roughly  the 
size  of  the  region  corresponding  to  the  astronomical  example  mentioned  in  the  introduction, 
although  in  that  case  the  shape  and  hence  the  boundary  of  the  region  C  are  different.)  Under 
these  assumptions,  when  £  =  0,  an  approximation  to  the  probability  that  maxtX(t,<r)  >  b 
is 

[|C|6/(4ir<r2)  +  |dC|/(4ff,/2er)]<?>(fe).  (5.17) 

Assume  that  oq  is  thought  to  equal  1.  but  this  value  may  be  in  error  to  the  extent  that  the 
true  value  may  be  as  small  as  0.5  or  as  large  as  2.0.  For  the  test  based  on  a  fixed  choice  of 
a.  it  is  easy  to  see  from  (5.15)  that  one  makes  a  slightly  more  serious  error  by  choosing  a 
larger  than  tr0  than  by  choosing  it  smaller.  For  significance  level  0.05  and  power  0.9  some 
experimentation  shows  that  the  choice  a  «  0.9  minimizes  the  maximum  value  of  £  necessary 
to  obtain  the  desired  power  at  both  <Tq  =  0.5  and  2.0.  In  particular  6  =  4.58  and  £  =  6.8 
satisfy  these  requirements. 

Suppose,  on  the  other  hand  we  search  adaptively  for  <r0  over  the  range  (0.33,  3.0).  By 

(3.5)  the  value  of  b  yielding  a  significance  level  of  0.05  is  6  «  5.1.  Assuming  that  the  value 
of  £  is  proportional  to  the  square  root  of  the  sample  size,  we  might  reasonably  measure 
the  relative  efficiency  of  these  two  procedures  by  the  square  of  the  ratio  of  the  values  of 
£  necessary  to  obtain  a  given  power.  For  a  power  of  0.9,  at  cr0  =  1.0,  the  non-adaptive 
procedure  is  about  16%  more  efficient,  while  at  <7o  =  0.5  or  2.0  the  adaptive  procedure  is 
about  22%  more  efficient.  The  range  of  values  of  (r0  over  which  the  non-adaptive  procedure 
is  more  efficient  is  from  about  .63  to  about  1.5.  Similar  results  hold  at  power  of  0.80  and 
0.95. 

As  one  might  expect  from  the  form  of  (5.16),  these  relative  efficiencies  are  larger  in  higher 
dimensions.  For  a  similar  scenario  with  jV  =  3,  the  adaptive  procedure  is  about  33%  more 
efficient  when  <r0  is  misspecified  by  a  factor  of  2,  whereas  the  nonadaptive  procedure  is  about 
25%  more  efficient  when  <70  =  1. 
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6  Example  and  Application 

6.1  Example 

We  illustrate  the  methods  in  this  paper  on  some  simulated  data  for  N  =  1.  Gaussian  white 
noise  was  simulated  on  the  interval  [-40.40]  and  smoothed  for  t  €  C  =  [—10.10]  using  the 
Gaussian  kernel  (1.3)  for  a  €  /  =  [0.2. 5.0].  This  was  repeated  for  the  same  Gaussian  white 
noise  plus  a  Gaussian  signal  located  at  to  =  0  with  scale  <70  =  1  and  height  (  =  6.  The 
results  are  shown  in  Figure  1  for  the  lower  limit  o\  —  0.2,  the  upper  limit  <r2  =  5,  and  the 
true  scale  <r0  =  1.  Figure  2  shows  the  same  data  for  all  a  plotted  on  the  log  scale  for  <r,  so 
that  the  field  is  stationary  in  separate  horizontal  and  vertical  directions  (but  not  jointly). 

The  test  statistic  is  X _ =  2.24  for  noise  only,  and  =  6.92  for  signal  plus  noise;  in  the 

latter  case  the  maximum  is  located  at  t  =  —0.16  and  a  =  1.18,  close  to  the  location  and 
scale  of  the  signal.  Adler  (1981.  p.  117  ff.)  gives  a  simple  and  fast  way  of  approximating 
the  Hadwiger  characteristic  from  equally  spaced  sample  data,  and  we  plot  this  against  the 
threshold  fa  in  Figure  3.  It  is  in  reasonable  agreement  with  the  expected  value  (3.2)  for  the 
noise  only  data.  The  approximate  level  0.05  critical  value  for  Xm.,  is  3.40,  found  by  equating 
(3.2)  to  0.05  and  solving  for  fa.  No  signal  is  detected  in  the  noise  only  data,  but  the  signal  is 
detected  in  the  signal  plus  noise  data.  The  contributions  of  the  individual  terms  in  (3.2)  at 
fa  =  3.40  are  0.032  for  the  first  “interior”  term,  0.018  for  the  second  ‘location’  term,  0.001  for 
the  third  “scale”  term,  and  P{A"  >  3.40}  =  0.0003  for  the  last  term.  Thus  in  this  example 
the  “scale”  corrections  to  the  first  term  are  an  important  part  (38%)  of  the  approximate 
probability.  However  the  contribution  from  the  sides  is  relatively  unimportant,  despite  the 
25-fold  range  of  kernel  scale;  this  can  be  seen  from  Figure  2,  where  the  image  appears  a  lot 
smoother  in  the  vertical  direction  than  along  the  base,  so  that  the  excursion  set  touches  the 
sides  a  lot  less  than  the  base.  If  we  use  the  data  along  the  base  only,  that  is  we  restrict  a 
to  o\  =  0.2,  then  the  maximum  X(t,oi)  is  unchanged  for  noise  only  and  still  high  (5.07) 
for  signal  plus  noise;  the  approximate  0.05  critical  value  is  slightly  lower  at  3.30  and  we 
reach  the  same  conclusion  as  before.  Thus  in  this  case  searching  over  kernel  scales  does  not 
appreciably  alter  the  analysis. 

6.2  Application 

We  apply  our  results  to  N  =  3  dimensional  medical  images.  Talbot  et  al.  (1991)  carried  out 
an  experiment  in  which  PET  cerebral  blood  flow  images  were  obtained  for  9  subjects  while  a 
thermistor  was  applied  to  the  forearm  at  both  warm  (37°C)  and  hot  (48°C)  temperaratures, 
each  condition  being  studied  twice  on  each  subject.  The  purpose  of  the  experiment  was  to 
find  regions  of  the  brain  that  were  activated  by  the  hot  stimulus,  compared  to  the  warm 
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stimulus.  Individual  images  were  aligned  and  sampled  on  a  128  x  128  x  80  lattice  of  voxels, 
separated  at  approximately  1.4mm.  1.7mm  and  1.5mm  on  the  front-back,  left-right  and 
vertical  axes,  respectively.  For  the  present  work,  we  analysed  the  difference  images  of  the 
two  warm  conditions  as  a  dataset  which  should  have  an  expectation  of  zero  throughout.  We 
also  analysed  the  difference  between  the  average  of  the  two  hot  conditions  and  the  average 
of  the  two  warm  conditions  to  search  for  activation  due  to  the  painful  heat  stimulus.  These 
difference  images  were  averaged  over  subjects  to  increase  the  signal  to  noise  ratio.  Worsley  et 
al.  (1992)  estimated  the  voxel  standard  deviation  by  pooling  the  voxel  sample  variance  (with 
8  degrees  of  freedom)  over  all  voxels.  A  normalized  image  was  produced  by  dividing  each 
voxel  by  the  pooled  standard  deviation  and  multiplying  by  the  square  root  of  the  number  of 
subjects. 

In  PET  imaging  the  resolution  of  the  image  is  determined  by  the  “point  response  func¬ 
tion.”  which  is  measured  by  placing  a  point  source  of  isotope  in  the  PET  camera  and 
measuring  the  response.  It  can  be  reasonably  approximated  by  a  Gaussian  kernel  with 
a  =  2.87mm.  Worsley  et  al.  (1992)  found  that  the  distribution  of  the  noise  component 
of  the  normalized  image  is  well  approximated  by  that  of  a  stationary  white  noise  Gaussian 
random  field  smoothed  with  the  point  response  function.  We  are  thus  able  to  observe  the 
smoothed  process  X  for  a  =  2.87mm  but  not  the  unsmoothed  process  Z.  If  the  value 
<70  in  the  unobserved  model  (1.1)  exceeds  <J\  =  2.87  (and  the  signal  shape  is  Gaussian), 
it  would  be  appropriate  to  introduce  additional  smoothing  by  a  Gaussian  kernel  with  scale 
a  —  (o‘o~°’i)1/2-  Since  <r0  is  unknown,  the  normalized  image  was  smoothed  with  seven  Gaus¬ 
sian  kernels  ranging  in  scale  from  zero  to  14.0mm  so  that  the  scale  ranged  from  ay  =  2.87mm 
to  <r2  =  (<7j  + 14.02)1/2  =  14.3mm.  The  Gaussian  kernels  were  chosen  so  that  the  the  resulting 
log( a)  values  were  uniformly  spaced  on  (log(cri),  log(cr2)].  The  choice  of  seven  kernels  gave 
a  sampling  interval  to  resolution  ratio  on  the  log( or)  scale  that  approximated  the  sampling 
interval  to  resolution  ratio  of  locations  in  the  highest  resolution  {a  —  <7i)  image. 

The  region  of  the  brain  C  of  interest  was  chosen  to  be  a  hemisphere  with  radius  r=7cm 
covering  the  top  part  of  the  brain.  The  test  statistic  was  Xmtix  =  4.05  for  noise  only, 
and  Xn.v  =  7.47  for  signal  plus  noise;  in  the  latter  case  the  maximum  was  located  at 
cr  =  <r2  =  14.3mm  in  the  anterior  cingulate/ supplement  ary  motor  area,  plus  three  other  local 
maxima  all  with  X(t,<7)  >  5.  The  volume  of  C  was  \C\  =  (2/3)?rr3  =  718cm3,  its  area  was 
\dC\  —  2tct2  +  7rr2  =  462cm2,  and  its  mean  caliper  diameter  was  A(dC)  =  r  +  7rr/4  =  12.5cm 
giving  H(dC)  =  79cm.  The  observed  Hadwiger  characteristic  is  plotted  against  the  threshold 
b  in  Figure  4,  together  with  the  expected  value  (3.7).  There  is  reasonable  agreement  for  the 
noise  only  data,  but  when  the  signal  is  present  there  are  some  discrepancies  in  the  upper 
tail.  The  approximate  level  0.05  critical  value  for  L„  is  4.92,  found  by  equating  (3.7)  to 
0.05  and  solving  for  6,  so  that  no  signal  is  detected  in  the  noise  only  data,  but  the  signal 
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is  detected  in  the  signal  plus  noise  data.  The  contributions  of  the  individual  terms  in  (3.2) 
at  6  =  4.92  are  0.028  for  the  first  “interior*’  term,  0.018  for  the  second  ‘location*  term,  and 
0.0028  and  0.0013  for  the  third  and  fourth  ‘scale*  terms;  the  rest  are  below  0.0001.  As  in 
the  N  =  1  example  the  "edge"  corrections  to  the  first  term  are  an  important  part  (43%)  of 
the  approximate  probability,  but  the  contribution  from  the  scale  is  relatively  unimportant. 
If  we  use  the  data  along  the  base  only,  that  is  we  restrict  <7  to  —  2.87mm,  then  the 
maximum  X(t,cri)  is  unchanged  for  noise  only  but  much  lower  (4.89)  for  signal  plus  noise; 
the  approximate  0.05  critical  value  is  slightly  lower  at  4.86  and  so  we  fail  to  detect  a  signal 
in  the  noise  only  data  set  and  we  just  detect  a  signal  in  the  signal  plus  noise  data  set.  This 
nicely  illustrates  the  advantages  of  our  approach  of  searching  over  kernel  scales  as  well  as 
locations. 


7  Discussion  and  Open  Problems 

There  remain  a  number  of  related  problems,  which  we  discuss  briefly  below. 

7.1  Confidence  Regions 

When  a  signal  is  present,  i.e.,  £  >  0,  we  may  want  to  estimate  its  location  and  size,  say  by 
a  confidence  region.  This  problem  is  related  to  finding  confidence  sets  for  a  change- point, 
and  we  can  make  use  of  ideas  appearing  in  that  literature.  For  a  review  of  a  number  of 
possibilities  see  Siegmund  (1988a)  and  references  cited  there.  For  example,  suppose  that  we 
know  the  signal  shape  /.  Then  from  the  form  (1.6)  for  the  likelihood  function,  it  follows 
that  a  1  —  a  conditional  likelihood  ratio  confidence  region  for  t0,  cr0  is  given  by 


{(to,oo)  :  AT(to,<7o)  ^  Xmrnx  c} , 
where  c  =  c[A*(to,<ro)]  is  such  that 

P{*nuu  >  X(to,ao)  +  c|X(to,<7o)}  =  a. 

Since  X(to,  <7o)  is  sufficient  for  £,  the  conditional  probability  does  not  depend  on  the  unknown 
nuisance  parameter. 

The  same  conditional  probability  appears  in  (5.4),  so  the  method  suggested  in  Section 
5  for  evaluating  this  probability  is  applicable.  The  only  difference  is  that  when  we  come  to 
assess  the  accuracy  of  the  approximation  given  in  (5.9),  we  are  now  interested  in  the  con¬ 
ditional  probability  itself,  and  particularly  in  values  of  b  and  y  which  make  the  conditional 
probability  small.  Since  the  power  calculations  in  Section  5  and  hence  the  argument  given 
there  involve  values  of  b  and  y  for  which  the  conditional  probability  is  large,  and  since  the 
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approximation  (5.9)  is  not  even  monotonic  in  y  when  y  is  a  substantial  fraction  of  6,  it  is 
not  clear  whether  the  same  approximation  will  work  here.  We  have  performed  numerical 
calculations  assuming  that  N  =  1  and  ctq  is  known  to  compare  conditional  likelihood  ra¬ 
tio  confidence  regions  based  on  the  approximation  (5.9)  with  the  presumed  gold  standard: 
numerical  calculation  of  the  expected  number  of  upcrossings  (cf.  Knowles,  Siegmund,  and 
Zhang  (1991)).  For  Xmtx  =  5.0,  4.0,  and  3.5.  we  find  that  the  two  approximations  give 
essentially  the  same  0.95  confidence  regions.  At  5.0  they  also  give  essentially  the  same  0.99 
regions,  but  at  4.5  the  regions  differ  somewhat  and  at  3.5  the  use  of  the  approximation  (5.9) 
breaks  down. 

It  would  be  interesting  to  study  this  issue  in  more  detail  and  give  an  approximation  to 
the  conditional  probability  in  (5.4)  that  is  more  generally  accurate  when  that  probability  is 
small. 

7.2  Box  Shaped  Kernels  and  Signals 

In  some  cases  it  might  be  thought  appropriate  to  use  a  discontinuous  kernel,  either  because  it 
appears  that  the  signal  itself  is  discontinuous  or  as  a  convenient  approximation.  An  example 
would  be  the  box  shaped  kernel  k(x)  =  1  if  max|x,|  <1/2  and  0  otherwise.  A  systematic 
study  of  this  case  is  beyond  the  scope  of  the  present  paper.  To  illustrate  briefly  some  of  the 
essential  features,  we  suppose  that  cr0  is  known  and  for  definiteness  that  N  —  2.  Then  if  C 
is  large  relative  to  <7,  so  edge  effects  can  be  neglected,  we  find  by  the  methods  of  Siegmund 
(1988b)  or  Loader  (1991)  that  when  £  =  0, 

P{maxtX(t,cr)>b}ss(\C\/<72)b3<i>{b).  (7.1) 

Because  of  the  extra  roughness  in  the  sample  paths  of  X(t,  a)  in  this  case,  the  tail  probability 
involves  two  extra  powers  of  6  in  comparison  with  (5.17)  and  even  one  power  more  than  (3.5), 
where  we  also  search  for  the  correct  a0.  (Another  consequence  of  this  sample  path  roughness 
is  that  in  most  cases  one  will  also  want  to  make  a  correction  for  the  inevitable  discrete  grid 
of  observations,  for  which  the  model  of  continuously  observed  white  noise  usually  does  not 
provide  an  adequate  approximation.  We  omit  discussion  of  this  aspect  of  the  problem.  See, 
for  example,  Siegmund  (1988b).) 

A  substantial  elaboration  of  the  argument  of  James,  James,  and  Siegmund  (1987)  shows 
that  if  in  this  case  /  =  k  and  o  =  &o,  the  power  of  the  likelihood  ratio  test  is  given 
approximately  by 

1  -  *(6 -  {)  +  ®(6 -  0{[46J(i  +  {)  +  (\3b  +  f)l /({(I  +  0!!}-  (7-2) 

It  is  interesting  to  consider  the  possibility  that  we  use  a  box  shaped  kernel  when  the 
signal  is  smooth  or  conversely  a  smooth  kernel  when  the  signal  is  box  shaped.  Suppose, 
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for  example,  that  /  is  Gaussian  but  one  uses  a  box  shaped  kernel.  It  is  easy  to  see  that 
the  expectation  of  X(t 0,c)  is  maximized  by  taking  a  about  equal  to  2.8<70  and  that  the 
maximum  value  is  about  fi0  =  0.94355^  x  £  =  0.89£.  Solely  on  the  basis  of  this  effect  on  the 
non-centrality  parameter,  one  would  expect  the  relative  efficiency  of  the  box  shaped  kernel 
to  the  Gaussian  kernel  to  be  about  80%.  However,  this  analysis  neglects  the  effect  of  the 
different  kernels  on  the  threshold  b  and  the  contribution  of  the  second  term  in  (5.4)  to  the 
power  of  the  test.  For  a  0.05  test  for  the  same  100  x  100  square  in  the  example  at  the  end 
of  Section  5,  the  threshold  with  a  box  shaped  kernel  and  a  =  2.8,  <7o  =  1  is  about  b  =  5.34 
compared  to  b  =  4.53  when  using  a  Gaussian  kernel  with  a  —  <r0  =  1  (cf.  (5.17)).  This 
difference  suggests  the  relative  efficiency  of  the  box  shaped  kernel  is  even  less.  Although 
greater  fluctuations  in  the  sample  paths  of  the  observed  process  when  a  box  shaped  kernel 
is  used  lead  one  to  expect  the  contributions  of  the  second  term  in  (5.17)  to  be  larger,  even 
without  evaluating  this  term  it  seems  reasonable  to  conclude  that  use  of  the  box  shaped 
kernel  when  the  signal  is  Gaussian  leads  to  a  substantial  loss  of  efficiency. 

There  seems  to  be  a  smaller  loss  of  efficiency  if  the  signal  is  actually  box  shaped  but 
one  uses  a  Gaussian  kernel.  For  a  numerical  example,  suppose  that  C  is  again  a  100  x  100 
square  and  that  /  is  box  shaped  with  a0  =  1.  From  (7.1)  and  (7.2)  it  follows  that  for  b 
=  5.75  we  have  an  approximately  0.05  level  test  with  power  0.9  at  £  =  6.49.  If  we  use  a 
Gaussian  kernel,  the  signal  to  noise  ratio  at  to  is  maximized  at  approximately  a  —  <7o/2.8. 
An  approximately  0.05  level  test  is  obtained  by  taking  b  —  4.99.  It  may  be  shown  that  (5.15) 
with  tj  =  0  provides  an  approximation  for  the  po-  Ter  of  the  test,  so  power  of  about  0.9  is 
attained  at  £  =  6.84.  This  corresponds  to  a  relative  efficiency  of  about  90%. 

7.3  Adaptation  on  Additional  Parameters 

The  principal  points  of  this  paper  are  that  in  using  a  spherical  Gaussian  kernel,  one  can 
determine  appropriate  rejection  thresholds  when  the  scale  parameter  a  is  chosen  adaptively; 
and  if  the  signal  is  itself  spherical  Gaussian,  adaptive  choice  of  a  seems  a  reasonable  strategy 
with  regard  to  power.  It  seems  relatively  straightforward  to  generalize  most  of  our  results  to 
allow  for  different  values  of  a  in  different  coordinate  directions,  i.e.,  to  allow  for  an  elliptical 
Gaussian  kernel  oriented  in  an  arbitrary  but  data  independent  direction.  However,  we  do 
not  know  how  much  power  is  gained  or  lost  by  this  extra  degree  of  freedom. 

A  more  challenging  problem  is  to  allow  the  Gaussian  kernel  also  to  have  arbitrary  covari¬ 
ances,  so  the  data  determine  the  spatial  orientation  of  the  elliptically  contoured  Gaussian 
hill. 
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7.4  Numerical  Considerations  and  Unsolved  Mathematical  Prob¬ 
lems 

As  suggested  in  (4.4),  we  conjecture  that  there  exists  a  precise  mathematical  theorem,  which 
says  that  the  quantities  we  have  calculated  in  Sections  3  and  4  give  the  correct  tail  behavior 
of  the  maxima  of  these  Gaussian  fields  as  b  — »  oc  to  the  number  of  terms  given.  Previous 
attempts  in  this  direction  seem  either  to  deal  with  very  general  processes,  but  only  first 
order  asymptotic  behavior  (e.g.,  Adler,  1981,  section  6.9,  pp.  159-167),  or  as  in  Sun  (1993)  a 
special  Gaussian  field  and  second  order  asymptotic  behavior.  An  interesting  point  of  Sun’s 
research  was  the  numerical  observation  that  much  better  agreement  between  theory  and 
simulations  was  obtained  when  the  second  order  terms  are  included,  but  in  her  work  the 
formulas  were  too  complicated  to  comprehend  the  size  of  these  second  order  effects  except 
through  the  numerical  values  they  produced.  In  equations  (3.6)  and  (3.7)  it  is  clear  that  one 
often  must  include  the  top  three  orders  of  magnitude  before  the  remaining  terms  become 
insignificant  numerically. 

Giving  a  precise  mathematical  relation  between  the  quantities  calculated  in  Section  3 
and  the  tail  probability  of  the  maximum  of  the  Gaussian  field  seems  an  interesting  and 
challenging  mathematical  problem. 

It  would  be  interesting  to  derive  a  general  lower  bound  on  the  constant  /c,  so  that  one 
could  see  generally  whether  the  terms  involving  1/k  in  (3.6)  and  (3.7)  might  dominate  the 
higher  order  terms  in  these  expressions,  but  we  have  been  unable  to  do  so. 
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Figure  1  (b)  Smoothed  signal  +  noise 
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Figure  2(b)  Smoothed  signal  +  noise 


0'S  0’S  O'Z 


Hadwiger  characteristic 


Figure  4  Hadwiger  characteristic,  PET  data 
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SUMMARY 


We  suppose  that  our  observations  can  be  decomposed  into  a  fixed  signal  plus  random 
noise,  where  the  noise  is  modelled  as  a  particular  stationary  Gaussian  random  field  in  N- 
dimensional  Euclidean  space.  The  signal  has  the  form  of  a  known  function  centered  at  an 
unknown  location  and  multiplied  by  an  unknown  amplitude,  and  we  are  primarily  interested 
in  a  test  to  detect  such  a  signal.  There  are  many  examples  where  the  signal  scale  or  width 
is  assumed  known,  and  the  test  is  based  on  maximising  a  Gaussian  random  field  over  all 
locations  in  a  subset  of  .V-dimensional  Euclidean  space.  The  novel  feature  of  this  work  is 
that  the  width  of  the  signal  is  also  unknown  and  the  test  is  based  on  maximising  a  Gaussian 
random  field  in  N  +  1-dimensions.  N  dimensions  for  the  location  plus  one  dimension  for  the 
width.  Two  convergent  approaches  are  used  to  approximate  the  null  distribution:  one  based 
on  the  method  of  Knowles  and  Siegmund  (1989),  which  uses  a  version  of  Weyl's  ( 1939)  tube 
formula  for  manifolds  with  boundaries,  and  the  other  based  on  some  recent  work  by  Worslev 
(1993b).  which  uses  the  Hadwiger  characteristic  of  excursion  sets  as  introduced  by  Adler 
(1981).  Finally  we  compare  the  power  of  our  method  with  one  based  on  a  fixed  but  perhaps 
incorrect  signal  width. 


